Retrieved from: https://www.bing.com/images/search?view=detailV2&id=EF63E80B990B63015E8D6EED897448BF320E188A&thid=OIP.w_9nO-IKJvbrmAD0cV29XQHaE8&exph=2000&expw=3000&q=shark+attack&selectedindex=7&ajaxhist=0&vt=0&adlt=demote&shtp=GetUrl&shid=af3c16a6-a20c-4cf2-aaed-3cb63afa935e&shtk=RG8gU2hhcmtzIFJlYWxseSBOb3QgTGlrZSBIb3cgSHVtYW5zIFRhc3RlPw%3D%3D&shdk=QXVmIEJpbmcgdm9uIHd3dy50b2RheWlmb3VuZG91dC5jb20gZ2VmdW5kZW4%3D&shhk=TKN%2FYQ3smG9emFvtZPDaFmhe39V9NgOF0fnLL%2Fvqo6c%3D&form=EX0023&shth=OSH.n%252FGuSuLF53reEEGSp9ZQfQ

Introduction

This report analyzes the ‘global shark attacks’ dataset. The dataset was retrieved from http://www.sharkattackfile.net/ on 29.03.2020 and contains current and historical records of shark/human interactions. Our goal is to better understand shark behavior and to test a few hypotheses.

Data Import

# Load readxl and import the GSAF spreadsheet from the working directory
library(readxl)
shark <- read_excel("GSAF5.xls")
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in W1678 / R1678C23: got 'stopped here'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in X4619 / R4619C24: got 'Teramo'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in X6045 / R6045C24: got 'change filename'
## New names:
## * `Case Number` -> `Case Number...1`
## * `Case Number` -> `Case Number...20`
## * `Case Number` -> `Case Number...21`
## * `` -> ...23
## * `` -> ...24
# Inspect the first 20 rows and the overall structure of the data
View(shark[1:20, ])
str(shark)
## Classes 'tbl_df', 'tbl' and 'data.frame':    25775 obs. of  24 variables:
##  $ Case Number...1       : chr  "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
##  $ Date                  : chr  "05-Feb-2020" "Reported 30-Jan-2020" "17-Jan-2020" "16-Jan-2020" ...
##  $ Year                  : chr  "2020" "2020" "2020" "2020" ...
##  $ Type                  : chr  "Unprovoked" "Provoked" "Unprovoked" "Unprovoked" ...
##  $ Country               : chr  "USA" "BAHAMAS" "AUSTRALIA" "NEW ZEALAND" ...
##  $ Area                  : chr  "Maui" "Exumas" "New South Wales" "Southland" ...
##  $ Location              : chr  NA NA "Windang Beach" "Oreti Beach" ...
##  $ Activity              : chr  "Stand-Up Paddle boarding" "Floating" "Surfing" "Surfing" ...
##  $ Name                  : chr  NA "Ana Bruna Avila" "Will Schroeter" "Jordan King" ...
##  $ Sex                   : chr  NA "F" "M" "F" ...
##  $ Age                   : chr  NA "24" "59" "13" ...
##  $ Injury                : chr  "No injury, but paddleboard bitten" "PROVOKED INCIDENT  Scratches to left wrist" "Laceration ot left ankle and foot" "Minor injury to lower leg" ...
##  $ Fatal (Y/N)           : chr  "N" "N" "N" "N" ...
##  $ Time                  : chr  "09h40" NA "08h00" "20h30" ...
##  $ Species               : chr  "Tiger shark" NA "\"A small shark\"" "Broadnose seven gill shark?" ...
##  $ Investigator or Source: chr  "K. McMurray, TrackingSharks.com" "K. McMurray, TrackingSharks.com" "B. Myatt & M. Michaelson, GSAF; K. McMurray, TrackingSharks.com" "K. McMurray, TrackingSharks.com" ...
##  $ pdf                   : chr  "2020.02.05.Maui.pdf" "2020.01.30.R-Avila.pdf" "2020.01.17-Schroeter.pdf" "2020.01.16-King.pdf" ...
##  $ href formula          : chr  "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.02.05.Maui.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.30.R-Avila.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.17-Schroeter.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.16-King.pdf" ...
##  $ href                  : chr  "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.02.05.Maui.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.30.R-Avila.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.17-Schroeter.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.16-King.pdf" ...
##  $ Case Number...20      : chr  "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
##  $ Case Number...21      : chr  "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
##  $ original order        : chr  "6506" "6505" "6504" "6503" ...
##  $ ...23                 : logi  NA NA NA NA NA NA ...
##  $ ...24                 : logi  NA NA NA NA NA NA ...
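
The import messages and the str() output point to a few structural issues: three columns share the name `Case Number`, the unnamed columns `...23` and `...24` appear as logical columns of NA in the preview, and the sheet reports 25775 rows while `original order` starts at 6506, which may indicate empty padding rows (an assumption that should be checked against the real file). The following is a minimal sketch of how such columns and rows could be detected and dropped. It runs on a small hypothetical stand-in data frame (`toy`), not on the real spreadsheet, and the helper names `drop_empty_cols` and `drop_empty_rows` are illustrative assumptions.

```r
# Hypothetical stand-in that mimics the structure of the imported sheet
toy <- data.frame(
  a = c("2020.02.05", "2020.01.30.R", NA),
  b = c("2020", "2020", NA),
  c = c("2020.02.05", "2020.01.30.R", NA),
  d = c(NA, NA, NA),
  stringsAsFactors = FALSE
)
names(toy) <- c("Case Number...1", "Year", "Case Number...20", "...23")

# Drop columns that contain only NA values
drop_empty_cols <- function(df) {
  keep <- colSums(!is.na(df)) > 0
  df[, keep, drop = FALSE]
}

# Drop rows in which every cell is NA
drop_empty_rows <- function(df) {
  keep <- rowSums(!is.na(df)) > 0
  df[keep, , drop = FALSE]
}

cleaned <- drop_empty_rows(drop_empty_cols(toy))

# Check whether two similarly named columns hold identical values
dup_case <- identical(cleaned[["Case Number...1"]],
                      cleaned[["Case Number...20"]])
```

If the same check returned TRUE on the real data, the repeated `Case Number` columns could be dropped without losing information. Whether that holds for the full spreadsheet is not confirmed here.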

Data Cleaning

First, we investigate, clean, and prepare each variable in the dataset. This step matters because we need to know what kind of data we are dealing with before we can prepare it for further analysis.
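
As a starting point for this variable-by-variable inspection, the sketch below counts the missing values per column and converts character columns to integers. It is a minimal example on a hypothetical stand-in data frame (`toy`); the column names follow the str() output above, but the values, including the free-text age "teen", are invented for illustration, and `na_summary` and `Age_num` are assumed names.

```r
# Hypothetical stand-in with a few of the columns seen in str(shark)
toy <- data.frame(
  Year = c("2020", "2020", NA, "1995"),
  Sex  = c(NA, "F", "M", "M"),
  Age  = c(NA, "24", "59", "teen"),  # "teen" is an invented free-text entry
  stringsAsFactors = FALSE
)

# Count and share of missing values per column
na_summary <- data.frame(
  variable = names(toy),
  missing  = as.integer(colSums(is.na(toy))),
  share    = as.numeric(colMeans(is.na(toy)))
)

# Convert character columns to integer; entries that are not
# numeric become NA, so the coercion warning is suppressed on purpose
toy$Year    <- as.integer(toy$Year)
toy$Age_num <- suppressWarnings(as.integer(toy$Age))

# Number of values lost by the conversion of Age
lost_age <- sum(is.na(toy$Age_num)) - sum(is.na(toy$Age))
```

Keeping the original `Age` column next to `Age_num` makes it possible to review the entries that could not be converted before deciding how to treat them.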

Analysis and Imputation of Missing Values